Information retrieval on mixed written and spoken documents
Authors
Abstract
While advances have been made in the structuring, indexing and retrieval of multimedia documents, we propose to study the unexplored problem of information retrieval on heterogeneous media sets composed of written and spoken documents. The coverage of modalities in retrieved results seems to be an important part of the user's information need. We show that this need is not satisfied by the usual bag-of-words models and propose a method to balance modalities within the query expansion process of the probabilistic model. As there have never been experiments in this domain, we suggest that building evaluation data for the addressed media (text and speech) as well as other media (such as images) is important for the multimedia information retrieval community.
Similar resources
Written versus spoken queries: A qualitative and quantitative comparative analysis
This paper reports on an experimental study on the differences between spoken and written queries. A set of written and spontaneous spoken queries are generated by users from written topics. These two sets of queries are compared in qualitative terms and in terms of their retrieval effectiveness. Written and spoken queries are compared in terms of length, duration, and part of speech. In additi...
Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica
Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smart phones. Methods: This multi-method research follows simulation technique from mixed models of the operations research methodology, and the documentary research method, simultaneously. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...
Efficient Interactive Retrieval of Spo Ranked by Reinforcem
Unlike written documents, spoken documents are difficult to display on the screen; it is also difficult for users to browse these documents during retrieval. It has been proposed recently to use interactive multi-modal dialogues to help the user navigate through a spoken document archive to retrieve the desired documents. This interaction is based on a topic hierarchy constructed by the key ter...
Speech Recognition and Information Retrieval: Experiments in Retrieving Spoken Documents
The Informedia Digital Video Library Project at Carnegie Mellon University is making large corpora of video and audio data available for full content retrieval by integrating natural language understanding, image processing, speech recognition and information retrieval. Information retrieval from corpora of speech recognition output is critical to the project’s success. In this paper, we out...
Ontology based Semantic Annotation of Urdu Language Web Documents
The proliferation of multilingual text on the Internet has increased the demand for efficient information retrieval independent of language. Among the variety of languages, Urdu is one of the most commonly spoken and written languages in South Asia. However, due to its unstructured format, access to relevant information is still a big challenge. The semantic web technologies enable the advan...